
Baolin Peng


Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math

Apr 30, 2025

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

Apr 29, 2025

Magma: A Foundation Model for Multimodal AI Agents

Feb 18, 2025

On the Emergence of Thinking in LLMs I: Searching for the Right Intuition

Feb 10, 2025

Latent Action Pretraining from Videos

Oct 15, 2024

Teaching AI Agents to Search with Reflective-MCTS and Exploratory Learning

Oct 15, 2024

Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning

Oct 09, 2024

Improving Autonomous AI Agents with Reflective Tree Search and Self-Learning

Oct 02, 2024

SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models

Aug 28, 2024

Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning

Jun 30, 2024